---
title: "Analysis"
output: html_notebook
---

BUSINESS QUESTIONS:
Note: there is no question 1; this is due to a formatting error.
2. 
a) Is there a trend in injuries by region, and are there different peaks at different times of year per region?
b) Analysis of the types of animals that are injured, also by region – is there a species that is more liable to injury in certain regions? (cat/dog by region)
c) What is the outcome? Does this differ by region?

3.
a) Total call volume for complaint calls: How has this trended over time?
b) Is there a particular animal being called about the most?
c) Do particular suburbs have different types of complaint calls? Do they call about different animals? ((MAKE A LEAFLET MAP FOR THIS!))

4.
Business Intelligence – using the insights you have found, can you predict how this might look for the upcoming year?

```{r}
library(tidyverse)
library(tsibble)
library(forecast)
source("cleaning_script.R")
```

Note: seasons in Australia:
Summer: December - February
Autumn: March - May
Winter: June - August
Spring: September - November

also: The 'wet season' in Australia's North: November - April
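
A small helper can make these season definitions usable when grouping complaints by time of year. This is only a sketch: `australian_season()` and `in_wet_season()` are hypothetical helper names (they are not part of `cleaning_script.R`), and they simply encode the month ranges noted above.

```{r}
library(dplyr)
library(lubridate)

# Hypothetical helper: label a date with its Australian season,
# using the month ranges from the note above.
australian_season <- function(date) {
  m <- month(date)
  case_when(
    m %in% c(12, 1, 2) ~ "Summer",
    m %in% 3:5         ~ "Autumn",
    m %in% 6:8         ~ "Winter",
    m %in% 9:11        ~ "Spring"
  )
}

# Hypothetical helper: TRUE for the northern 'wet season' (November - April).
in_wet_season <- function(date) {
  month(date) %in% c(11, 12, 1, 2, 3, 4)
}

example_dates <- as.Date(c("2019-12-15", "2019-07-01", "2019-10-20"))
australian_season(example_dates)
in_wet_season(example_dates)
```

Something like `mutate(season = australian_season(date_received))` could then be used before grouping, assuming the date column has already been parsed as a `Date`.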

https://www.mdpi.com/2076-2615/8/7/100 will be useful reading for later; talk about the need to reduce euthanasia or similar and the effect this has on people.

Intro:
Introduce the point of the talk, talk about the Australian RSPCA, talk about how the data was gathered (using the information from the websites), and the PURPOSE of this investigation (which is to help the RSPCA know which areas/animals to focus their efforts on). As I'm introducing the datasets I can introduce the two different cities. First, we can talk a little about Australia as a whole: its climate, the kinds of animals etc. The point here is to set the scene before diving too deep into facts and figures; for a non-technical audience especially, this will help keep them engaged and make for a more holistic presentation. In my opinion, it's always good to zoom out and see the big picture, rather than getting lost in the myopia of some csv files.

Townsville Intro:
Townsville is a city on the north-eastern coast of Queensland, Australia. With a population of 180,820 as of June 2018, it is the largest settlement in North Queensland; it is unofficially considered its capital. [note: put a map of Australia with Queensland and Townsville highlighted here. Talk a little bit about the population density, urbanisation, climate, types of animals that are common here etc. Show some photos of the area too]

Brisbane Intro:
Exact same as above

### 3.
a) Total call volume for complaint calls: How has this trended over time?

First, let's look at the Townsville animal complaints.
```{r}
animal_complaints %>% 
  group_by(date_received) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = date_received, y = count)) +
  geom_line() +
  scale_x_date(date_breaks = "6 months", date_labels = "%b-%y")
```
This shows some seasonality, and also an increase followed by a decline. Each summer (in December) the calls are much lower, rising again each winter. Using `geom_smooth()`, we get:

```{r}
animal_complaints %>% 
  group_by(date_received) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = date_received, y = count)) +
  geom_smooth() +
  geom_point() +
  scale_x_date(date_breaks = "1 year", date_labels = "%y")
```

This shows a bit more clearly the general trend of call volume. From 2014 it steadily rises, peaking in 2017. Afterwards, it steadily declines to only slightly higher than where it started. It would be very difficult to say whether this trend will continue downward, go upward or stay relatively flat.
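
One way to separate the long-run trend from the seasonal swing is a seasonal decomposition. The sketch below uses base R's `stl()` on a simulated monthly series (`toy_calls` is made up, not the Townsville data), so it only shows the mechanics. Applying it to the real counts would first need them aggregated into a regular monthly series.

```{r}
# Simulated monthly call counts, for illustration only (not real data):
# a rise-then-fall trend plus a yearly cycle peaking in July, plus noise.
set.seed(42)
n_months <- 72
month_index <- seq_len(n_months)
trend <- 300 + 100 * sin(pi * month_index / n_months)
seasonal <- 40 * cos(2 * pi * (month_index - 7) / 12)
toy_calls <- ts(trend + seasonal + rnorm(n_months, sd = 10),
                start = c(2014, 1), frequency = 12)

# Decompose into seasonal, trend and remainder components.
fit <- stl(toy_calls, s.window = "periodic")
head(fit$time.series)
plot(fit)
```

The trend panel of the plot is the part that would be relevant to any forecast; the seasonal panel shows the repeating within-year pattern.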







Note: if we have time or if it's helpful, we will fix this graph so that quarters are properly displayed.
```{r}
brisbane_complaints %>% 
  group_by(date) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = date, y = count)) +
  geom_point() +
  geom_line() +
  scale_x_date(date_breaks = "3 months", date_labels = "%y")
```

Again, we see the seasonality of winter having more calls. The fact this is in both Brisbane and Townsville suggests a fairly general trend.

```{r}
brisbane_complaints %>% 
  group_by(date) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = date, y = count)) +
  geom_point() +
  geom_smooth() +
  scale_x_date(date_breaks = "1 year", date_labels = "20%y")
```

Other than declining a little over 2016, the number of calls sees a slow but steady increase towards 2020, being thousands higher than it was in 2016 and 2017. So, the general trend is that the RSPCA is getting more complaints as time goes on. This does not necessarily mean the line will continue to go up. We are also missing Q3 from 2016, so this skews the curve a little.
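
A missing quarter like this can be checked for programmatically rather than by eye. The sketch below uses a made-up quarterly table (`toy_quarterly` is illustrative, not the Brisbane data) and assumes each quarter is represented by its start date.

```{r}
library(dplyr)

# Made-up quarterly counts with one quarter deliberately left out
# (illustration only, not the Brisbane data).
toy_quarterly <- tibble(
  date  = as.Date(c("2016-01-01", "2016-04-01", "2016-10-01", "2017-01-01")),
  count = c(120, 135, 128, 140)
)

# Every quarter start between the first and last observation.
all_quarters <- tibble(
  date = seq(min(toy_quarterly$date), max(toy_quarterly$date), by = "quarter")
)

# Quarters that should exist but have no row in the data.
missing_quarters <- anti_join(all_quarters, toy_quarterly, by = "date")
missing_quarters
```

Knowing exactly which periods are absent matters before smoothing or forecasting, since most time series methods assume a regular series.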



### 3.
b) Is there a particular animal being called about the most?

```{r}
animal_complaints %>% 
  ggplot(aes(x = animal_type)) +
  geom_bar()
```

The number of calls about dogs dwarfs the number about cats. Let's look at specific numbers:
```{r}
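animal_complaints %>% 
  count(animal_type) %>% 
  # ratio of each animal's calls to the most common animal's calls
  # (for cats vs dogs this is 4094 / 38319)
  mutate(ratio_to_most_common = n / max(n))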
```

So in Townsville, cats account for only about 11% as many calls as dogs! Let's look at Brisbane:

```{r}
brisbane_complaints %>% 
  group_by(type_of_animal) %>%
  count()

brisbane_complaints %>% 
  group_by(type_of_animal) %>%
  ggplot(aes(x = type_of_animal)) +
  geom_bar()
```

We have a lot more animal types here than in the previous data. We also have many calls about Attacks with no animal specified, many of which very likely involved dogs. However, we can't say this for sure.

So that makes 
```{r}
4745 / 13334  # ~0.356: cats get roughly 36% as many calls as dogs

# all other animals divided by dogs, leaving out unspecified (which we believe potentially contains a high proportion of dogs)
9457 / 13349  # ~0.708
```

Most of the other animals have very small counts; interestingly, foxes seem to make up a decent proportion of the calls regarding wild animals.

To answer the question, however, it's mainly dogs and cats, and especially dogs. Dogs are being called about more than all of the other animals combined (if we leave Unspecified to the side).

(Attack refers to the initial description of the complaint)
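
The hand-typed ratios above can also be computed straight from a count table, which avoids copying numbers by hand. A minimal sketch on made-up records (`toy_animals` is illustrative only; the column name `type_of_animal` follows the Brisbane data used in this notebook):

```{r}
library(dplyr)

# Made-up complaint records (illustration only).
toy_animals <- tibble(
  type_of_animal = c(rep("Dog", 6), rep("Cat", 3), "Fox")
)

toy_shares <- toy_animals %>%
  count(type_of_animal, sort = TRUE) %>%
  mutate(
    share_of_all = n / sum(n),   # proportion of all calls
    ratio_to_top = n / max(n)    # relative to the most-called-about animal
  )
toy_shares
```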


### 3. 
c) Do particular suburbs have different type of complaint calls? Do they call about different animals?

OK, so this one is difficult simply because there is a huge number of suburbs.
```{r}
brisbane_complaints %>% 
  count(suburb)
# There are 192 suburbs! Using a fill on a graph is not going to work, and neither is putting them on the x-axis of a bar graph.

animal_complaints %>% 
  count(suburb)
# 85 suburbs for Townsville

animal_complaints %>% 
  count(electoral_division)
# also 11 electoral divisions. Could we look at suburbs one electoral division at a time? Possibly
```

Idea: What about using leaflet to visualise types of complaint calls on a map?

This is more feasible with the Townsville dataset, as it has fewer suburbs and only 6 types of complaints. It may be necessary to wrangle the Brisbane data a little further to narrow down the categories (for both type_of_animal and complaint_type).
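
A rough sketch of what the leaflet idea could look like. The complaint data has suburb names but no coordinates, so `toy_suburbs` below uses placeholder suburbs with made-up coordinates and counts; real suburb centroids would have to come from a separate geocoded source.

```{r}
library(dplyr)
library(leaflet)

# Placeholder suburbs with made-up coordinates and counts (illustration only).
toy_suburbs <- tibble(
  suburb     = c("Suburb A", "Suburb B", "Suburb C"),
  lat        = c(-19.26, -19.30, -19.32),
  lng        = c(146.81, 146.76, 146.73),
  complaints = c(520, 180, 90)
)

complaint_map <- leaflet(toy_suburbs) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~lng, lat = ~lat,
    radius = ~sqrt(complaints) / 2,   # scale marker size by complaint volume
    popup = ~paste0(suburb, ": ", complaints, " complaints")
  )
complaint_map
```

Scaling the radius by the square root keeps marker area roughly proportional to the count, so large suburbs don't swamp the map.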

FOR TOWNSVILLE WE HAVE:
85 different suburbs
6 complaint types
2 animal types

We want to break it down by suburb and complaint type, and then by suburb and animal type
```{r}
animal_complaints %>% 
  ggplot(aes(x = suburb, fill = animal_type)) +
  geom_bar(position = "fill") +
  coord_flip() +
  scale_x_discrete(guide = guide_axis(n.dodge = 5))
```

```{r}
animal_complaints %>% 
  ggplot(aes(x = suburb, fill = complaint_type)) +
  geom_bar(position = "fill") 
  # coord_flip() +
  # scale_x_discrete(guide = guide_axis(n.dodge = 5))
```

The challenge, again, is that there are so many suburbs that we can't get any useful information out of the graphs.
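
One possible way around this, besides filtering by count, is to keep only the busiest suburbs and pool the rest into a single category. A minimal sketch with `forcats::fct_lump_n()` on made-up labels (`toy_calls_by_suburb` is illustrative only):

```{r}
library(dplyr)
library(forcats)

# Made-up suburb labels (illustration only).
toy_calls_by_suburb <- tibble(
  suburb = c(rep("A", 5), rep("B", 4), "C", "D")
)

# Keep the 2 most frequent suburbs and pool the rest into "Other".
lumped <- toy_calls_by_suburb %>%
  mutate(suburb = fct_lump_n(suburb, n = 2)) %>%
  count(suburb)
lumped
```

Unlike dropping small suburbs, this keeps every call in the plot, so the overall proportions are not distorted.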

```{r}
animal_complaints %>% 
  count(suburb, sort = TRUE)

# let's keep only suburbs with at least 500 cases;
# counting per suburb and animal type so the stacked bar heights are actual call counts

animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total >= 500) %>% 
  count(suburb, animal_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = animal_type)) +
  geom_col() +
  coord_flip()
```

The highest count by quite a large margin is unallocated to a specific suburb.

```{r}
animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total >= 500) %>% 
  count(suburb, animal_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = animal_type)) +
  geom_col(position = "fill") +
  coord_flip() 
```
At a glance, there's no big difference in animal types. All the suburbs have a large majority of dogs. Could we do a hypothesis test to see if there is a statistically significant difference?
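
One candidate for such a test is a chi-squared test of independence between suburb and animal type. The sketch below runs base R's `chisq.test()` on a made-up contingency table (`toy_table` is illustrative, not the Townsville counts); whether this is the right test for the real data is an assumption that would need checking, for example that expected counts per cell are not too small.

```{r}
# Made-up counts of dog and cat complaints for three suburbs (illustration only).
toy_table <- matrix(
  c(450, 50,
    380, 60,
    120, 110),
  ncol = 2, byrow = TRUE,
  dimnames = list(suburb = c("A", "B", "C"), animal_type = c("dog", "cat"))
)

# Chi-squared test of independence: is the dog/cat mix the same across suburbs?
animal_test <- chisq.test(toy_table)
animal_test$p.value
```

On the real data, the table could be built with `table(animal_complaints$suburb, animal_complaints$animal_type)`, ideally after restricting to suburbs with enough complaints.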

```{r}
animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total < 500, suburb_total > 100) %>% 
  count(suburb, animal_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = animal_type)) +
  geom_col(position = "fill") +
  coord_flip() 
```
Looking at the suburbs with fewer complaints, we see an outlier. Townsville City has a far higher share of cat complaints than any of the other suburbs. In fact, it's almost 50/50!

Now still looking at the Townsville dataset, we'll break it down by complaint type:
```{r}
animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total >= 500, suburb_total <= 4000) %>% 
  count(suburb, complaint_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
```

Now for the lower end of the count:
```{r}
animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total < 500, suburb_total > 50) %>% 
  count(suburb, complaint_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
```

I'm not sure much can be gleaned with so many complaint types. Let's try focusing more specifically.

One thing to note here is the vast variation in tolerance for noise. Most categories are consistent except this one. Cluden, for example, has a small percentage of noise complaints, while Bohle Plains has a huge percentage. This requires more investigation to figure out the root cause:

```{r}
# each plot compares two complaint types within suburbs that have at least 500 complaints
animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total >= 500) %>% 
  filter(complaint_type %in% c("Noise", "Attack")) %>% 
  count(suburb, complaint_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 

animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total >= 500) %>% 
  filter(complaint_type %in% c("Noise", "Aggressive Animal")) %>% 
  count(suburb, complaint_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 

animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total >= 500) %>% 
  filter(complaint_type %in% c("Wandering", "Aggressive Animal")) %>% 
  count(suburb, complaint_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 

animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total >= 500) %>% 
  filter(complaint_type %in% c("Enclosure", "Noise")) %>% 
  count(suburb, complaint_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
```
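
The noise share per suburb can also be tabulated directly, which makes it easier to rank suburbs than reading proportions off the bars. A minimal sketch on made-up records (`toy_complaints` is illustrative only; the column names follow the Townsville data used above):

```{r}
library(dplyr)

# Made-up complaint records for two suburbs (illustration only).
toy_complaints <- tibble(
  suburb         = c(rep("A", 10), rep("B", 10)),
  complaint_type = c(rep("Noise", 2), rep("Wandering", 8),
                     rep("Noise", 7), rep("Wandering", 3))
)

# Share of each suburb's complaints that are about noise, highest first.
noise_share <- toy_complaints %>%
  group_by(suburb) %>%
  summarise(
    total = n(),
    noise_share = mean(complaint_type == "Noise"),
    .groups = "drop"
  ) %>%
  arrange(desc(noise_share))
noise_share
```

Keeping `total` alongside the share helps flag suburbs where a high percentage rests on only a handful of calls.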

For the suburbs with more than 100 but fewer than 500 complaints:
```{r}
animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total < 500, suburb_total > 100) %>% 
  filter(complaint_type %in% c("Noise", "Attack")) %>% 
  count(suburb, complaint_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 

animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total < 500, suburb_total > 100) %>% 
  filter(complaint_type %in% c("Noise", "Aggressive Animal")) %>% 
  count(suburb, complaint_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 

animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total < 500, suburb_total > 100) %>% 
  filter(complaint_type %in% c("Wandering", "Aggressive Animal")) %>% 
  count(suburb, complaint_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 

# all complaint types, one panel per suburb
animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total < 500, suburb_total > 100) %>% 
  #filter(complaint_type %in% c("Enclosure", "Noise")) %>% 
  count(suburb, complaint_type, name = "count") %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() +
  facet_wrap(~ suburb)
```


```{r}
# count per suburb and complaint type so bar heights are actual call counts
animal_complaints %>% 
  add_count(suburb, name = "suburb_total") %>% 
  filter(suburb_total < 500, suburb_total > 100) %>% 
  count(suburb, complaint_type, name = "count") %>% 
  ggplot(aes(x = complaint_type, y = count, fill = complaint_type)) +
  geom_col() +
  facet_wrap(~ suburb)
```

From this graph, we can see that Hyde Park has a disproportionate number of private impounds, while Bohle Plains has more noise complaints.

Potential hypothesis test:

That the level of private impounds in Hyde Park being greater is statistically significant.

That the level of noise in Bohle Plains being greater is statistically significant.
```{r}
animal_complaints %>% 
  group_by(suburb) %>% 
  count() %>% 
  filter(n < 500 & n > 100)
```
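
The hypotheses above could be checked with a two-sample test of proportions, comparing one suburb's share of a complaint type against all other suburbs combined. The sketch below uses base R's `prop.test()` with made-up counts (the numbers are illustrative only, not taken from the data), and the one-sided alternative is an assumption that matches the "being greater" wording.

```{r}
# Made-up counts (illustration only): noise complaints out of all complaints
# in one suburb versus all the other suburbs combined.
noise_counts <- c(one_suburb = 60, other_suburbs = 900)
total_counts <- c(one_suburb = 150, other_suburbs = 6000)

# One-sided two-sample test: is the first proportion greater than the second?
noise_test <- prop.test(x = noise_counts, n = total_counts, alternative = "greater")
noise_test$estimate
noise_test$p.value
```

Because the suburbs were picked out after looking at the graphs, any p-value from this kind of test should be read cautiously.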






Let's look at the suburbs with many more complaints now:

```{r}
animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type, animal_type) %>% 
  filter(count >= 500 & count <4000) %>% 
  ggplot(aes(x = complaint_type, y = count, fill = complaint_type)) +
  geom_col() +
  facet_wrap(~ suburb)
```





### 2. a) Is there a trend in injuries by region, and are there different peaks at different times of year per region?

It's impossible to say whether there are different peak times of year per region, as we only have data for each year as a whole, unless we look at suburb rather than region and break this down by time. However, I think it'd be good to use the general Australia (nationwide) data.









```{r}
animal_outcomes %>% 
  ggplot(aes(x = region, y = number_of_occurences, fill = outcome, col = outcome)) +
  geom_col(position = "fill")
```


```{r}
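animal_outcomes %>% 
  group_by(region, year) %>% 
  # sum the recorded occurrences; n() would only count rows per region and year
  summarise(count = sum(number_of_occurences, na.rm = TRUE), .groups = "drop") %>% 
  ggplot(aes(x = year, y = count)) +
  #geom_point() +
  geom_line() +
  facet_wrap(~ region)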
```



















